A nonlinear dynamics approach can be used to quantify complexity in written texts.
As a first step, a one-dimensional system is examined: two written texts by one author (Lewis
Carroll), together with one translation into an artificial language, i.e. Esperanto,
are mapped into time series. Their corresponding shuffled versions are used to obtain a
"baseline". Two different one-dimensional time series are used here: (i) one based on word lengths (LTS),
(ii) the other on word frequencies (FTS). It is shown that the generalized Hurst exponent h(q) and
the derived f(α) curves of the original and translated texts show marked differences. The original
"texts" are far from giving a parabolic f(α) function, in contrast to the shuffled texts. Moreover,
the Esperanto text has more extreme values. This suggests a cascade-like model, with multiscale
time-asymmetric features, for finally written texts. A discussion of the difference and complementarity of
mapping into an LTS or FTS is presented. The FTS f(α) curves are more open than the LTS ones.

PACS numbers: 89.75.Fb, 89.75.Da, 05.45.Tp, 89.75.Kd



I. INTRODUCTION 

The Hurst (or equivalently Hölder) exponent [1], measuring the so-called
self-affinity of signals, in short the roughness exponent, can be generalized
to some generalized fractal dimension D [1, 3]. However, multifractals
[3] seem to better describe an object through its evolving
geometrical or structural features. One has to recognize
that there is some debate on whether multifractality exists
because of finite size effects [4]. The discussion of
such a point belongs in a review article, outside
the present paper. Let it simply be recalled that, through
a generator and from an initiator, one can easily produce
a fractal object with a given dimension pQ. Note that
producing realistic and meaningful multifractal models
is still a challenge [5]. Next, one can ask "what to do
with the knowledge that a dynamical object is a multifractal?";
even more: "How can this nonlinear measure
of knowledge be useful?". Nevertheless, the first question
is "Is there any evidence of multifractality?".

Many authors have discussed the origin, characteristics,
content, and role of multifractals. Let me point to a
pioneering experimental one [5], a theoretical one [5], a conceptual
one [7], and a few so-called applications [8-10]
in order to set up some wide perspective. Let us also
recall that one has to obtain an h(q) function, which is a
generalized Hurst or Hölder exponent, or a D(q) generalized
dimension, where q represents the degree of some
moment distribution of some time-evolving variable. Subsequently
one can obtain an f(α) spectrum, in which f(α)
is the distribution of the exponent α (= d[q h(q)]/dq) of the
object.

A written text can be considered as a physical signal
[11, 12] because it can be decomposed through level
thresholds which are like a set of characters taken from
an alphabet. As such, writings belong to the top level
class of complexity [13]. One question immediately follows:
are multifractals found in real texts? This question was
already raised in [2] when studying the distribution of
letters in Moby Dick; see also [15-18].

In [Jj3 [5D] it was claimed that long range order correlations
(LROC) between words in texts express an author's
ideas, and in fine even constitute an author's signature
[32 [25]. Comparisons of written texts translated from
one language to another [23], in particular from the point
of view of word LROC, are of interest from the complexity
point of view, the more so if the number of words
in the two languages is markedly different. In fact, since
Shannon himself [24], writings and codings have been of interest
in statistical physics. Writings are systems practically
composed of a large number of internal components (the
words, signs, and blanks in printed texts).

The texts used here for investigating some a priori unknown
structure were chosen for their rather wide diffusion
and, incidentally, for being representative of a famous
scientist, Lewis Carroll: Alice in Wonderland (AWL)
[25, 26] and Through the Looking-Glass (TLG) [27]. Knowing
the mathematical quality of this author's mind, one
might expect to find some special, unusual, unknown features
in his texts. Interestingly, a translation of AWL
into Esperanto is available on the internet; below, this
text will be referred to as ESP.

Having no previous baseline for such investigations, the
three texts have been shuffled in order to serve as a
baseline. This should allow one to check the robustness of the
investigation methods and of the findings, if any, about
multifractality of such written texts.

In Sect. II the data downloading and preliminary manipulations
are explained. Next, the methodology is exposed:
one can distinguish frequency time series (FTS)
from length time series (LTS). Different techniques exist
to investigate such supposedly multifractal signals.
Those are briefly recalled for completeness. Such techniques
are complementary; the one used here sticks
to the classical box counting method [3]. The resulting
data does not show any anomaly that would call
the simplest method into question and would require more
advanced techniques.

More importantly, in the author's opinion, one has to
remain within a statistical physics framework. In order
to do so, one aim consists in searching for correlations
between fluctuations, in the spirit of linear response
theory [28, 29]. Thus, the 12 time series are transformed
into "fluctuations", i.e. series based on the signs of the
"derivatives" of the texts (!), before calculating the
multifractal features.

In Sect. III the results for the generalized Hurst exponent
h(q) and the corresponding f(α) function [5] are
presented and discussed. In Sect. IV one comments
on indicators, i.e. the shape and extreme values of
h(q), α, and f(α) characterizing the texts. Those suggest
how to analyze (dis)order and correlations, whence
so-called text complexity, along cascade-like models [30]
with multiscale time-asymmetric features.

In Sect. V a summary leads to a conclusion.

II. DATA AND METHODOLOGY 

The time series are made from a mapping of the texts
mentioned above, downloaded from a freely available
website [31]. The chapter heads were removed
before analysis. Three files are considered: (i) the
English version of AWL, in short AWL; (ii) its translation
into Esperanto, in short ESP; and (iii) the
chronologically later (English) text TLG. Note
that even though the series are to be transformed, see
below, the same notation is kept thereafter, referring
without any ambiguity to the original (o) text or to
its shuffled version (s), e.g. AWLo, TLGs.

The shuffle algorithm is one found on Wikipedia. In
brief, the first data point is exchanged with some following
one, its location chosen from a generated random
number. The second data point is exchanged with some
following one, chosen from another random number, etc.
The random number generator was checked to lead to a
rather uniform distribution for a number between 0 and
1. The algorithm was applied ten times on the texts to
get the final shuffled texts used here for analysis, comparison,
and discussion. In so doing, the 6 documents
have been transformed into 12 numerical one-dimensional
nonlinear maps in two ways [32]: (i) by counting the
number of occurrences of each word in the whole document,
deducing its frequency f. The words are ranked
accordingly, giving rank 1 to the most frequent word.
Then, the text is "rewritten" into a series of numbers,
such that at each appearance of a word a number equal
to its rank replaces the word. Such a series is called
the frequency time series (FTS); (ii) by considering the
length l (number of letters) of a word. One records the
length l of the word at each successive "time" in the document,
i.e. the first word is considered to be emitted at
time t = 1, the second at time t = 2, etc. A time series
based on the amplitude l(t) is so constructed. It is called
a length time series (LTS).

                              AWLo      ESPo      TLGo
Number of words               27342     25592     30601
Number of different words     2958      5368      3205
Number of characters          144927    154445    164147
Number of "sentences"         1633      2016      2059
Number of punctuation marks   4531      4752      4828

TABLE I: Basic statistical data for the three original texts of
interest. The number of words gives the size of the "length
time series". The number of different words gives the size of
the "frequency time series".
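The shuffling procedure described above (each data point exchanged with one located further along, the whole pass repeated ten times) resembles a Fisher-Yates-type shuffle. The following is only a minimal Python sketch under stated assumptions, not the code used for the paper: the function names `forward_shuffle` and `shuffle_text` are hypothetical, the swap partner may coincide with the point itself, and the seed is there only to make the example reproducible.

```python
import random


def forward_shuffle(items, rng):
    """One pass: exchange element i with a randomly located element at index >= i."""
    out = list(items)  # work on a copy, leave the input untouched
    n = len(out)
    for i in range(n - 1):
        # rng.random() is uniform on [0, 1); scale it to an index in [i, n - 1]
        j = i + int(rng.random() * (n - i))
        out[i], out[j] = out[j], out[i]
    return out


def shuffle_text(words, passes=10, seed=0):
    """Apply the one-pass shuffle `passes` times (ten times in the text)."""
    rng = random.Random(seed)  # assumption: seeded generator, for reproducibility
    out = list(words)
    for _ in range(passes):
        out = forward_shuffle(out, rng)
    return out


words = "the quick brown fox jumps over the lazy dog".split()
shuffled = shuffle_text(words, passes=10, seed=1)
```

A shuffle of this kind keeps the word-length and word-frequency distributions intact and only destroys the ordering, which is what makes the shuffled series usable as a baseline.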

Let it be mentioned that punctuation marks and other typographical
signs are disregarded: e.g., a "word" like don't
is considered as leading to "don" (3 letters) and "t" (1
letter). The same goes for singulars and plurals, or verb forms,
which give distinct words. For completeness, let it be
mentioned that the frequency of, for example, only lemmatized
nouns or verbs could be studied [33, 32]. Note
that the above mappings obviously lead
to a continuous-like series, i.e. without blanks or gaps
between words, the words now being numbers at successive time indices.
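The two mappings and the tokenization rule above can be sketched as follows. This is an illustrative Python sketch only: the helper names are hypothetical, lowercasing the text is an assumption (the source does not say how capital letters are treated), and so is the tie-breaking rule for words with equal counts (ranked here by first appearance).

```python
import re
from collections import Counter


def tokenize(text):
    """Keep runs of letters only, so that "don't" yields "don" and "t"."""
    # [^\W\d_]+ matches Unicode letters, which also covers Esperanto diacritics
    return re.findall(r"[^\W\d_]+", text.lower())  # lowercasing is an assumption


def length_series(words):
    """LTS: the length l(t) of the word emitted at time t = 1, 2, ..."""
    return [len(w) for w in words]


def frequency_series(words):
    """FTS: each word is replaced by its frequency rank (1 = most frequent)."""
    counts = Counter(words)
    # sorted() is stable and Counter keeps insertion order, so words with
    # equal counts are ranked by first appearance (assumption)
    ranked = sorted(counts, key=lambda w: -counts[w])
    rank = {w: r for r, w in enumerate(ranked, start=1)}
    return [rank[w] for w in words]


tokens = tokenize("Don't stop, don't!")
lts = length_series(tokens)
fts = frequency_series(tokens)
```

Both series have one entry per word of the text, so an FTS and an LTS built from the same document have the same length.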

There are several techniques to demonstrate multifractality
in time series, as nicely and recently reviewed in
[34] or by Schumann and Kantelhardt [35]. Although the
multiscaling features can be studied using different algorithms,
each method provides complementary information
about the complex structure of the time series.

One can analyze either the statistics or the geometry,
as well described in [36].

A statistical approach consists of defining an appropriate
intensive variable depending on a resolution parameter;
its statistical moments are then calculated by
averaging over an ensemble of realizations and at random
base points. The variable is said to be multifractal
if those moments exhibit a power-law dependence on
the resolution parameter. On the other hand, geometrical
approaches [37-42] try to assess a local power-law
dependency on the resolution parameter for the same intensive
variables at every particular point. The geometrical
approach is informative about the spatial localization
of self-similar (fractal) structures, but leads to some
difficulty when having to justify the retrieval of scaling
exponents.

The oldest multifractal analysis method is the multifractal
box counting (MF-BOX) technique [3], which fails
in the presence of non-stationarities, such as trends. This
deficiency led to the development of the wavelet transform
modulus maxima (WTMM) method, a generalized
box counting approach based on a wavelet transform by
Muzy-Bacry-Arneodo, as long ago as 1991 [3514T7]. Another
approach to study multifractality in time series is
the multifractal generalization of detrended fluctuation
analysis (MF-DFA), of which Kantelhardt et al., on one
hand, and Zunino et al., on the other hand [48-51], are the
most prolific representatives. It is based on the traditional
DFA [52] or extensions [53, 54].

FIG. 1: The so-called partition function χ(s, q) vs. s, the
sub-series size in Eq. (1), on log-log plots, in order to
obtain τ(q), Eq. (3), in the best possible power law regime, see
text, and subsequently the generalized Hurst exponent h(q),
Eq. (4), or the generalized fractal dimension D(q), Eq. (5),
in the case of FTS for the (left) original and (right) shuffled
texts. Only 3 representative q-values (-10, +2, +20) in each
case are shown to save space. Obvious notations to understand
the illustrating data are on the left axis. In the display,
the data has been arbitrarily displaced along the y-axis since
only the slope from a linear fit is relevant.

FIG. 2: The so-called partition function χ(s, q) vs. s, the
sub-series size in Eq. (1), on log-log plots, in order to
obtain τ(q), Eq. (3), in the best possible power law regime, see
text, and subsequently the generalized Hurst exponent h(q),
Eq. (4), or the generalized fractal dimension D(q), Eq. (5), in
the case of LTS for the (left) original and (right) shuffled texts.
Only 3 representative q-values (-10, +2, +20) in each case are
shown to save space. Obvious notations to understand the
illustrating data are on the left axis. In the display, the data
has been arbitrarily displaced along the y-axis since only the
slope from a linear fit is relevant.

FIG. 3: Generalized Hurst exponent of three original texts
analyzed through FTS (top) and LTS (bottom) mapping:
AWL: red, ESP: blue, TLG: green dots.

FIG. 4: Generalized Hurst exponent of three shuffled texts
analyzed through FTS (top) and LTS (bottom) mapping:
AWL: red, ESP: blue, TLG: green dots.

FIG. 6: f(α) for ESP, original (o) or shuffled (s) text along
FTS or LTS mapping.

Practically, MF-DFA is less complicated and demands
fewer presumptions than the WTMM algorithm. For
comparisons of these multifractal analysis methods, see
[151 l5"5"H5"7j. Such comparisons indicate that MF-DFA is
at least equivalent to WTMM, while an application of
WTMM needs more care and yields spurious multifractality
more often. In the present case, since there is no
trend in such series, the simplest box counting technique
is workable. Thus, the present study sticks to the classical
box counting method [3].



III. RESULTS 

A. Multifractal Analysis 

The simplest type of multifractal analysis, based upon
the standard partition function multifractal formalism
[3], is summarized below. However, it is relevant
to emphasize which variables are used in calculating the
partition function. Since I want to stick to statistical
physics ideas and methods, through the usual "Linear
Response Theory" concepts [35] 12S] for calculating long
range order features through quantities usually called
susceptibilities, it is useful to define the fluctuations of
interest before calculating the correlations between those.
The most basic or primary fluctuations are in the derivative
of a signal (or deviations from the mean, indeed). In
order to enhance the role of "fluctuations" in the time
series, i.e. the text, each series is transformed as follows,
according to the most primary set of thresholds:
if the length of a word in LTS (or its frequency or rank
in FTS) is smaller than the next one, the former word
gets a value = 2; if it is greater, it gets the value = 1;
and 0 if both are equal. The resulting series is called M_i
(1 ≤ i ≤ N - 1). Next, each M_i is cut into N_s subseries
of size s, where N_s is the integer part of N/s. The
ordering starts from the beginning of the text, dropping
the last data points if necessary. For either the original
or shuffled text, each FTS (or LTS) has the same
number of data points, 1 ≤ i ≤ N. The number of words
gives the size of the "length time series". The number
of different words gives the size of the "frequency time
series". See such values and other informative data in
Table I.
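The transformation into a "fluctuation" series and the cutting into sub-series described above can be sketched as follows. This is a hedged Python illustration, not the author's implementation: the function names are hypothetical, and the value given to two equal neighbours is exposed as the parameter `equal_value` (set to 0 here) because it should be read as an assumption.

```python
def sign_series(x, equal_value=0):
    """Map a series onto M_i: 2 if x[i] < x[i+1], 1 if x[i] > x[i+1].

    `equal_value` is used when both neighbours are equal (assumption: 0).
    The result has len(x) - 1 entries.
    """
    return [2 if a < b else 1 if a > b else equal_value
            for a, b in zip(x, x[1:])]


def split_boxes(m, s):
    """Cut M into N_s = floor(len(M) / s) consecutive boxes of size s.

    Boxes start at the beginning of the series; leftover points at the
    end are dropped.
    """
    n_boxes = len(m) // s
    return [m[k * s:(k + 1) * s] for k in range(n_boxes)]


m = sign_series([3, 1, 4, 4, 2])   # word lengths of a toy five-word text
boxes = split_boxes(m + [2], 2)    # five points, boxes of size 2
```

Only the ordering of neighbouring values survives this step, which is why the same code applies unchanged to an LTS or an FTS.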

Next, one calculates the probability

$$P(s,\nu) = \frac{\sum_{i=1}^{s} M_{(\nu-1)s+i}}{\sum_{\nu=1}^{N_s} \sum_{i=1}^{s} M_{(\nu-1)s+i}} \qquad (1)$$

in "windows" ν of size s, for every ν and s. Thereafter one
calculates the so-called partition function

$$\chi(s,q) = \sum_{\nu=1}^{N_s} P(s,\nu)^{q} \qquad (2)$$

for each s value. A power law behavior is expected,

$$\chi(s,q) \sim s^{\tau(q)}, \qquad (3)$$

where τ(q) plays the role of a partition function [3]. The
generalized Hurst exponent, h(q), is obtained through

$$h(q) = \frac{1 + \tau(q)}{q} \qquad (4)$$

from the best linear fit to Eq. (3) on a log-log plot to get
τ(q). The generalized fractal dimension D(q) [3] follows
next:

$$D(q) = \frac{\tau(q)}{q-1}. \qquad (5)$$

Let

$$\alpha = \frac{d\tau(q)}{dq}, \qquad (6)$$

from which, by inversion, one obtains q(α) and τ(q(α)),
whence the f(α) function [55]

$$f(\alpha) = q\alpha - \tau(q), \qquad (7)$$

as usual [3].
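The chain from the box probabilities to f(α) in Eqs. (1)-(7) can be sketched in a few lines. The following pure-Python code is an illustration under stated assumptions, not the multifractalma.java code mentioned later in the text: all function names are hypothetical, boxes with zero weight are skipped so that negative q stays finite, and the derivative in Eq. (6) is approximated by a central difference on the supplied q grid.

```python
import math


def partition_function(m, s, q):
    """chi(s, q) = sum over boxes of P(s, v)**q, Eqs. (1)-(2)."""
    n_boxes = len(m) // s
    sums = [sum(m[v * s:(v + 1) * s]) for v in range(n_boxes)]
    total = sum(sums)
    # assumption: boxes with zero weight are skipped (matters for q < 0)
    return sum((b / total) ** q for b in sums if b > 0)


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def tau(m, q, sizes):
    """tau(q): slope of log chi(s, q) versus log s, Eq. (3)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(partition_function(m, s, q)) for s in sizes]
    return slope(xs, ys)


def h_of_q(m, q, sizes):
    """Generalized Hurst exponent, Eq. (4); undefined at q = 0."""
    return (1 + tau(m, q, sizes)) / q


def d_of_q(m, q, sizes):
    """Generalized fractal dimension, Eq. (5); undefined at q = 1."""
    return tau(m, q, sizes) / (q - 1)


def spectrum(m, qs, sizes):
    """(alpha, f(alpha)) pairs from Eqs. (6)-(7) at the interior q values."""
    taus = [tau(m, q, sizes) for q in qs]
    pairs = []
    for k in range(1, len(qs) - 1):
        alpha = (taus[k + 1] - taus[k - 1]) / (qs[k + 1] - qs[k - 1])
        pairs.append((alpha, qs[k] * alpha - taus[k]))
    return pairs


uniform = [1] * 1024            # toy monofractal series
sizes = [2, 4, 8, 16, 32]
```

For the uniform toy series every box carries the same weight, so τ(q) = q - 1, h(q) = D(q) = 1, and the spectrum collapses onto the single point α = f(α) = 1; a q-dependent h(q) is what signals multifractality.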

In the present work, χ(s, q) has been calculated for a
very large s range, i.e. between 2 and 5000, but the data
reported below takes into account only the
values for 2 < s < 200, i.e. when 0.3 < log10 s < 2.3. In
such a range, the error bands are indistinguishable from
the (mean of the) data; see Figs. 3-7. Moreover, the τ(q)
values must be measured in s ranges where a power law,
as in Eq. (3), is found. Practically, one could do better by
letting the extremal values of the s interval be flexible
and, for example, varying them in each possible fit,
but this is much too time consuming for the final output,
the more so if one attempts to cover a large set of
q values. The above-mentioned extremal values were obtained,
or rather "considered as acceptable", along the
above criteria plus some regard for computer time, after
many trial plots.

The τ(q) values were calculated by a linear best fit on
a log-log plot of χ(s, q) vs. s, for all (integer) q values
such that -40 ≤ q ≤ 80. Note that there are about 6 x 120
data sets to fit. Thus, the number of q values examined
was reduced to those such that -35 < q < 75 for FTS
and -25 < q < 80 for LTS. This allows one to obtain
smooth curves, see below, with negligible error bars.

Another (technical) comment, in advance of the results
reported in the following subsection, is in order:
too broad an interval of q might sometimes cast doubt on
reported multifractality [55]. It has been argued that,
in the analysis of multifractality in turbulence or high-frequency
financial data, the interesting moment orders
q should not be greater than 8 in order to make the partition
function converge [59]. However, as an example,
the size of intraday high-frequency data is such that the
moment order can be taken to be -120 < q < 120 [55]. In
brief, depending on the size of the time series, the partition
function can be computed for rather large values
of q, if the convergence makes sense. In other words,
the error bars should become negligible or irrelevant for
the purpose of the discussion. As in other papers on multifractals
[60-62] or critical exponent search [53] [53] by the
authors and co-workers, great care has thus been taken
such that the data presented below is reliable from both
physics and statistics criteria. Needless to say,
it takes much time to do so and not all steps are recorded.
The fit code (multifractalma.java) is available from the
author upon request.

B. h(q) plots: Figs. 3-4

To save space, not all χ(s, q), Eq. (2), are shown
here, as mentioned above. However, for a preliminary
quantifying purpose, a summary of slope values for q =
2, with their standard deviations, found to be of the order of 10^-3,
is given in Table II. Recall that q = 2 in fact corresponds
to the standard DFA procedure. It is seen that
the number of data points, i.e. the number of boxes of
size s, taken in order to estimate the slope of the straight
line has some quite mild influence. The latter might be
a specific effect of time series based on written texts, or
of the preliminary transformation of the time series into
some sort of series of fluctuations. The matter has not
been investigated further.

A few examples of plots of the partition function χ(s, q)
vs. the sub-series size s, see Eq. (1), are shown in Figs.
1-2 on log-log graphs. As explained above, the s range
is chosen to be appropriate in order to obtain τ(q) from
Eq. (3). In each display, the raw data has been arbitrarily
displaced along the y-axis for better visualization;
only the slope from a linear fit is relevant. It is
already remarkable that the (positive or negative) slope
values are of the same order of magnitude for the
different but corresponding cases, either the original or
shuffled series, with an expected evolution, as in many
other studies. It is also seen that there is hope for some
possible distinction to be made between FTS and LTS
cases depending on the original text.

From Eq. (4), the resulting h(q) curves of the generalized
scaling (Hurst) exponents are given in Figs. 3-4
for the various texts, for all (integer) q values such that
-40 ≤ q ≤ 80. Observe that a marked numerical instability
exists at q = 0, as usual in fact, better seen for
the FTS than the LTS. For monofractal time series, h(q)
should be independent of q. A multifractal structure is
markedly observed, thus indicating that the scaling behaviors
of small and large fluctuations are different. It is
known that the generalized Hurst exponent for negative
q describes the scaling of small fluctuations,
because the windows ν, in Eq. (1), with small
variance dominate for this q-range. In contrast, the windows
ν with large variance have a stronger influence for
positive q. Hence small fluctuations are usually characterized
by larger scaling exponents than those related
to large fluctuations, thereby inducing a Fermi or
step-function-like shape of h(q).



C. Note on D(q)

Obviously,

$$D(q) = \frac{1}{q-1}\,\bigl(q\,h(q) - 1\bigr). \qquad (8)$$



First, observe the values of h(2). For stationary signals,
h(2) should coincide with the Hurst exponent, H,
if the system is monofractal, and D = 2H - 1. The h(2)
values, as can e.g. be read from Figs. 1-2, are given in
Table III. The values for the shuffled texts lead
without doubt to a fractal dimension = 1. The slight deviations
from unity for the original texts might be due to so-called
finite size effects. Recall that the topology of the time
series is a smooth line, without gaps.
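As a worked check of the relation D = 2H - 1 quoted above, Eqs. (4) and (5) can be combined at q = 2. The numbers below simply insert the h(2) entries listed in Table III; they are plain arithmetic added for illustration, not additional results.

```latex
\tau(2) = 2\,h(2) - 1, \qquad
D(2) = \frac{\tau(2)}{2-1} = 2\,h(2) - 1 .

h(2) = 0.997 \;\Rightarrow\; D(2) = 0.994 \quad \text{(original texts, FTS)}
h(2) = 0.994 \;\Rightarrow\; D(2) = 0.988 \quad \text{(original texts, LTS)}
h(2) = 0.999 \;\Rightarrow\; D(2) = 0.998 \quad \text{(shuffled texts, LTS)}
```

A deviation of a few thousandths in h(2) is thus doubled in D(2), which stays within about one percent of unity in every case.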

Next, it can be deduced that the generalized fractal
dimension for the FTS has a similar set of values for both
English texts, decaying from ≈ 1.2 to 1.0 for q increasing
but negative; D(q) decays slowly for q positive, barely
reaching a value 0.95 for q = 80. The value of D(q) is
much greater along the negative q axis, in particular for
ESPo, but is identical to the other two for q > 0. In LTS,
the form of D(q) is that to be expected and is similar to
the FTS form.

The shuffled texts have h(q), thus D(q), values remarkably
similar, both in range and variations, to those of
the original texts, but the D(q) values are closer to 1.0,
as could be expected. Very slight quantitative differences
occur, more markedly for ESPsFTS, see Fig.
2 (top), than for others. Along a Bayesian reasoning,
these differences can be attributed to the finite size of
the sample.

By the way, [65]

$$C_1 = \left.\frac{d\tau(q)}{dq}\right|_{q=1}, \qquad (9)$$

a measure of the intermittency lying in the signal y(n),
can be numerically estimated by measuring τ(q) around
q = 1. In each case, the value of C_1 is close to unity (table
of data not shown to save space). Some comment on
the role/meaning of C_1, a sort of information entropy on
the structural complexity of a signal, can be found in
Ref.lMl.
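The numerical estimate of C_1 "around q = 1" can be sketched as a central finite difference of τ(q). This is a minimal Python illustration under stated assumptions: the function name `intermittency_c1`, the step `dq`, and the two toy τ(q) functions are hypothetical and are not taken from the paper.

```python
def intermittency_c1(tau_of_q, dq=0.1):
    """Estimate C_1 = d tau / d q at q = 1 by a central difference.

    `tau_of_q` is any callable returning tau(q); `dq` is an assumed step.
    """
    return (tau_of_q(1 + dq) - tau_of_q(1 - dq)) / (2 * dq)


def monofractal_tau(q):
    """Toy case: tau(q) = q - 1, for which C_1 is exactly 1."""
    return q - 1


def curved_tau(q):
    """Toy case with a small curvature term (assumed form, for illustration)."""
    return q - 1 - 0.05 * q * (q - 1)


c1_mono = intermittency_c1(monofractal_tau)
c1_curved = intermittency_c1(curved_tau)
```

For the straight-line τ(q) the estimate equals 1, while the curved toy example gives 0.95, showing how a departure of C_1 from unity reflects the curvature of τ(q) near q = 1.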



D. f(α) plots: Figs. 5-7



The f(α) spectra are shown in Figs. 5-7. Instead of
presenting graphs based on FTS and LTS mappings, the
data is presented for the three original texts and their
shuffled counterparts. In so doing, one can better compare,
for a given sample, the methods and the subsequent
results.

Before discussing the original texts/series, it can be
observed that the shuffling does not fully symmetrize the
spectra. The rather finite size of these dynamical systems
is likely the cause of such an imperfection. However,
there is no doubt that all spectra are markedly
non-symmetric. This was first found for DLA simulations
in [5], with very high positive skewness, without much
discussion. Note that for all series, the FTS curves are
wider than the LTS ones. In all cases also, the original and its
shuffled series lead to a quasi-identical f(α) spectrum
for any α < 1 and up to α ≈ 1.1. Above α > 1.2, some
departure occurs for several series, indicating a marked
effect of large fluctuations.



IV. AND SOME FURTHER DISCUSSION 

Let us stress linguistics-like implications derived from 
the above time series analysis of linguistics samples: 

• h(q) and D(q): In LTS, even though the form of
D(q) is that to be expected and is similar to the
FTS form, it has to be stressed that AWLo and
ESPo are quantitatively very similar, but markedly
differ from TLGo. This already indicates that one
can observe a high structural complexity of the author's
style of writing through these two books.
Moreover, the multifractal analysis clearly shows
that a translation effect on the text style is much
better observed through an FTS than an LTS.

Finally, it is fair to mention a reviewer remark: the 
shuffled texts have remarkably similar D(q) values. 
Does this mean that the multifractality is a distribu- 
tional one and not due to non-linear correlations? 
It could be the case indeed for the shuffled texts. 

• f(α): the curve rises very sharply: starting from
negative values for α < 1.0, it reaches a maximum
(= 1.0) at 1.0, the maximum so-called box dimension,
and decays less rapidly for α > 1. The
not fully parabolic, to say the least, f(α) curve
indicates non-uniformity and strong LROC between
long words and small words, evidently arising
from strong short range order correlations between
these. In fact, the left (right) hand side of
the f(α) curve corresponds to fluctuations of the
q > 0 (q < 0)-correlation function. In other words,
they correspond to correlated fluctuations in small
(large) word distributions. It would be a nice conjecture
that such distributions are personal features
of the vocabulary grasped by an author.

In so doing, the extremal α values, i.e. α- and
α+, should quantify the somewhat systemic
way used by an author in his or her writings. These
extreme values for the 12 examined texts are given
in Table III. Observe that the Esperanto text differs
from both English texts in this respect,
the English texts presenting the same values.

A short final note: the Esperanto text curves behave
differently from the English texts in FTS,
though TLGo is different from the others in the
LTS case. However, the shuffled texts' f(α) spectra
behave in a very similar way, both qualitatively
and quantitatively. I conjecture the effect to be
due to the number of punctuation marks in such
cases, see Table I. Again, LROC and the related
structural complexity, style and creativity, are well
exemplified.



a writer's style. Of course, one might argue that only
texts written by a single author, Lewis Carroll, are examined,
not proving whether the f(α) so obtained is text-dependent,
writer-dependent, or both. That is why criteria
suggested for estimating the semantic complexity of a text as
if it were a time series are of interest. It remains to be seen
through more investigations whether the f(α) curve and
the cascade model hold true in other cases, and in general
characterize authors and/or texts, and other time
series. Note that the multifractal method should additionally
be able to distinguish a natural language signal
from a computer code signal [32] and should help in improving
translations by suggesting perfection criteria and
indicators of a translated text's qualitative values, similar
to those of the original one.

Let the remarkable difference between the Esperanto text
(Fig. 3a) and the English texts in the FTS analysis be
re-emphasized. Linguistic input should be sought
at this level and is left for further discussion. The origin
of the differences between TLG and AWL also needs more work
at the linguistic level.

On the other hand, one physics conclusion arises from
the above: the existence of a multifractal spectrum found
for the examined texts indicates a multiplicative process
in the usual statistical sense for the distribution of word
length and frequency in the text considered as a time
series. Thus linguistic signals may indeed be considered
as the manifestation of a complex system of high dimensionality,
different from random signals or from systems
of low dimensionality such as the financial and geophysical
(climate) signals. In so doing one can consider the
behavior of the atypical f(α) curve as originating from a
binomial multiplicative cascade process as in fully developed
turbulence [30], here for short and long words, on a
support [0,1].

Extensions to higher dimensions, e.g. in image recognition
[57] or in hypertext studies, are thus quite possible.
In relation to these remarks, work on fractal analysis of
paintings [67, 68], on handwriting [69], and on Japanese
garden patterns [70] should be mentioned to indicate
directions for further research.

TABLE II: Characteristic slope values, for q = 2, for the
original (o) and shuffled (s) texts, according to the type of
series (FTS or LTS) so examined

s range:          up to 200           200 - 5000          up to 5000
Original texts    slope   std.dev.    slope   std.dev.    slope   std.dev.
AWLoFTS           0.491   2E-3        0.561   2E-3        0.561   2E-3
ESPoFTS           0.519   2E-3        0.544   1E-3        0.545   1E-3
TLGoFTS           0.501   2E-3        0.777   3E-3        0.774   3E-3
AWLoLTS           0.538   2E-3        0.686   1E-3        0.684   1E-3
ESPoLTS           0.516   2E-3        0.619   2E-3        0.620   2E-3
TLGoLTS           0.531   2E-3        0.560   1E-3        0.560   2E-3

Shuffled texts    slope   std.dev.    slope   std.dev.    slope   std.dev.
AWLsFTS           0.525   1E-3        0.534   1E-3        0.533   1E-3
ESPsFTS           0.518   1E-3        0.474   1E-3        0.478   1E-3
TLGsFTS           0.524   1E-3        0.480   1E-3        0.480   1E-3
AWLsLTS           0.461   2E-3        0.584   1E-3        0.581   1E-3
ESPsLTS           0.519   4E-3        0.507   1E-3        0.506   1E-3
TLGsLTS           0.504   3E-3        0.587   1E-3        0.584   1E-3


TABLE III: Characteristic h(q = 2), α- and α+ values,
see Figs. 1-5, for the original, translated and shuffled texts,
according to the type of series (FTS or LTS) so examined

Original texts    h(2)     α-      α+
AWLoFTS           0.997    0.95    1.19
ESPoFTS           0.997    0.94    1.30
TLGoFTS           0.997    0.95    1.19
AWLoLTS           0.994    0.92    1.23
ESPoLTS           0.994    0.92    1.21
TLGoLTS           0.994    0.92    1.34

Shuffled texts    h(2)     α-      α+
AWLsFTS           ...      0.95    1.13
ESPsFTS           ...      0.96    1.16
TLGsFTS           ...      0.94    1.13
AWLsLTS           0.999    0.91    1.25
ESPsLTS           0.999    0.92    1.24
TLGsLTS           0.999    0.91    1.25



